Abstract

Bipartite Correlation Clustering is the problem of generating a set of disjoint bi-cliques on a
set of nodes while minimizing the symmetric difference to a bipartite input graph. The number
or size of the output clusters is not constrained in any way.

The best known approximation algorithm for this problem gives a factor of 11.¹ This result
and all previous ones involve solving large linear or semi-definite programs, which become
prohibitive even for modestly sized tasks. In this paper we present an improved factor 4
approximation algorithm for this problem using a simple combinatorial algorithm which does not
require solving large convex programs.

The analysis extends a method developed by Ailon, Charikar and Newman in 2008, where
a randomized pivoting algorithm was analyzed for obtaining a 3-approximation algorithm for
Correlation Clustering, which is the same problem on graphs (not bipartite). The analysis for
Correlation Clustering there required defining events for structures containing 3 vertices and
using the probability of these events to produce a feasible solution to a dual of a certain natural
LP bounding the optimal cost.

It is tempting here to use sets of 4 vertices, which are the smallest structures for which
contradictions arise for Bipartite Correlation Clustering. This simple idea, however, appears
to be elusive. We show that, by modifying the LP, we can analyze algorithms which take into
consideration subgraph structures of unbounded size. We believe our techniques are interesting
in their own right, and may be used for other problems as well.

¹ A previously claimed 4-approximation algorithm [1] is erroneous, as we show in the appendix.



1 Introduction 



Bipartite Correlation Clustering (BCC) is a problem in which the input is a bipartite graph and
the output is a set of disjoint clusters covering the graph nodes.² A cluster may contain nodes from
either side of the graph, but it may also contain nodes from only one side. We think of a cluster as
a bi-clique connecting all the elements from its left and right counterparts. An output clustering is
hence a union of bi-cliques covering the input node set. The cost of the solution is the symmetric
difference between the input and the output. Equivalently, any pair of vertices, one on the left and
one on the right, will incur a unit cost if either (1) an edge connects them but the output clustering
separates them into distinct clusters, or (2) no edge connects them but the output clustering puts
them in the same cluster. The objective is to minimize this cost.

This notion of clustering is natural when the number of clusters and their size are not known,
and the graph relations are bipartite by nature. It was studied in the context of molecular biology,
specifically, in gene expression data analysis (for example [3]). Other examples of bipartite data
abound. In collaborative filtering and recommender systems, interactions are given between users
and items [3], for example, raters vs. movies/songs. Other examples may include images vs.
user-generated tags and search engine queries vs. search results.

BCC is a bipartite version of the better known Correlation Clustering (CC), introduced by
Bansal, Blum and Chawla [5], where the objective is to cover an input set of nodes with disjoint
cliques (clusters) minimizing the symmetric difference with a given edge set over these nodes. One
motivation for BCC, which also applies to our setting, is a 2-stage clustering approach in which
one (i) applies binary classification machine-learning methods to predict pairs of nodes that should
be clustered together, and (ii) uses the learned classifier, applied to all pairs, as input to BCC.
Assuming there is a correct clustering of the data and that the above binary classifier has some 
bounded error rate with respect to that ground truth, we can recover, using an algorithm for CC 
(or, BCC in our bipartite case) a clustering of the data which is provably close to the true clustering 
(see HH). 

Another motivation is the alleviation of the need to specify the number of output clusters,
as often needed in clustering settings such as k-means or k-median. The treatment of clustering
problems as CC or BCC should be compared to the statistical theory of record linkage, which
predates it by decades and in which, in a typical application, one wishes to identify duplicate
records in a database riddled with human errors. The number of clusters is clearly unknown. In
fact, the original record linkage literature [7] considered the bipartite case, a typical example
being two government agencies cross-validating large databases of population information.

Bansal et al. [5] gave a $c \approx 10^4$ factor approximation algorithm for CC running in time
$O(n^2)$, where $n$ is the number of nodes in the graph. Later, Demaine et al. [8] gave an
$O(\log n)$ approximation algorithm for an incomplete version of CC, relying on solving an LP
and rounding its solution by employing a region growing procedure. By incomplete we mean that
only a subset of the node pairs participate in the symmetric difference cost calculation.³ BCC is,
in fact, a special case of incomplete CC, in which the non-participating node pairs lie on the same
side of the graph. Charikar et al. [9] provide a 4-approximation algorithm for CC, and another
$O(\log n)$-approximation algorithm for the incomplete case. Later, Ailon et al. [10] provided a
2.5-approximation algorithm for CC based on rounding an LP. They also provide a simpler
3-approximation algorithm, QuickCluster, which runs in time linear in the number of edges of the
graph. In [11] it was argued that QuickCluster runs in expected time $O(n + \mathrm{cost}(OPT))$.

² Here we consider the unweighted case, although a weighted version can be easily obtained from our analysis.

Van Zuylen et al. [12] provided de-randomization for the algorithms presented in [10] with no
compromise in the approximation guarantees. Mathieu and Schudy [13] considered the planted
graph version, in which the input is a noisy version of a union-of-cliques graph, and showed that a
PTAS is possible for this setting. Also, Giotis et al. [14] and, independently and using other
techniques, Karpinski et al. [15] gave a PTAS for the CC case in which the number of clusters is
constant.

Amit [16] was the first to address BCC directly. She proved its NP-hardness and gave a constant
11-approximation algorithm based on rounding a linear program, in the spirit of the algorithm of
Charikar et al. [9] for CC.

It is worth noting that in [1] a 4-approximation algorithm for BCC was presented and analyzed.
The presented algorithm is incorrect (we give a counterexample in Appendix A), but the attempt
to use arguments from [10] is an excellent one. We will show that an extension of the method in
[10] is needed.

1.1 Our Results 

Our main result, requiring a considerable development of previous techniques, is a randomized 
expected 4-approximation algorithm, PivotBiCluster. 

To explain how we attain it, we recall the method of Ailon et al. [10]. The algorithm for CC
presented there is as follows (we concentrate on the unweighted case). Choose a random vertex,
and form a cluster with its neighbors. Remove the cluster from the graph, and repeat until the
graph is empty. This random-greedy algorithm returns a solution with cost at most 3 times that
of the optimal solution, in expectation. The analysis was done by noticing that each cost element
is naturally related to a contradiction structure containing 3 vertices and exactly 2 edges between
them. This structure is, incidentally, the minimal structure forcing any solution to pay. In other
words, the locations in which any clustering errs must hit the set of contradicting structures. A
corresponding hitting set LP lower bounding the optimal solution was defined to capture this simple
observation, and a feasible solution was then conveniently assigned to its dual using probabilities
arising in the algorithm's probability space.
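
The random-greedy procedure recalled above can be sketched in a few lines of Python. This is only an illustrative sketch of the pivoting step as described in this paragraph, not the implementation from [10]; the function names `quick_cluster` and `cc_cost`, the `rng` parameter and the toy input are assumptions made for the example.

```python
import random
from itertools import combinations


def quick_cluster(nodes, edges, rng=None):
    """Random-pivot clustering: pick a pivot, cluster it with its remaining neighbors, repeat."""
    rng = rng or random.Random()
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(nodes)
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))        # uniform over the remaining vertices
        cluster = {pivot} | (adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster                         # remove the cluster from the graph
    return clusters


def cc_cost(nodes, edges, clusters):
    """Number of pairs on which the clustering disagrees with the input edge set."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    edge_set = {frozenset(e) for e in edges}
    return sum(
        (frozenset((u, v)) in edge_set) != (label[u] == label[v])
        for u, v in combinations(nodes, 2)
    )


# Hypothetical toy input: a path on 3 vertices with 2 edges, i.e. the
# contradiction structure described above. Every clustering pays at least 1.
path_nodes, path_edges = [0, 1, 2], [(0, 1), (1, 2)]
print(cc_cost(path_nodes, path_edges, quick_cluster(path_nodes, path_edges)))
```

On a disjoint union of cliques the sketch recovers the cliques for every pivot order, so the cost is 0; on the 3-vertex path every pivot choice leaves exactly one erroneous pair.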

It is tempting here to consider the corresponding minimal contradiction structure for BCC,
namely a set of 4 vertices, 2 on each side, with exactly 3 edges between them. Unfortunately, this
idea turned out to be elusive (a proposed solution attempting this [1] has a counterexample, which
we describe and analyze in Appendix A, and is hence incorrect). In our analysis we resorted to
contradiction structures of unbounded size. Such a structure consists of two vertices $\ell_1, \ell_2$ of the
left side and two sets of vertices $N_1, N_2$ on the right hand side such that $N_i$ is contained in the
neighborhood of $\ell_i$ for $i = 1, 2$, $N_1 \cap N_2 \neq \emptyset$ and $N_1 \neq N_2$. We define a hitting LP as we did earlier,
this time of possibly exponential size, and analyze its dual in tandem with a carefully constructed
random-greedy algorithm. As this analysis sketch suggests, the algorithm is not symmetrical with
respect to the right and left side of the input. Indeed, at each round it chooses a random pivot
vertex on the left, constructs a cluster with its right hand side neighbors, and then for each other
vertex on the left hand side makes a randomized decision whether to join the new cluster based on
the intersection pattern of its neighborhood with the pivot's neighborhood.

³ In some of the literature, CC refers to the much harder incomplete version, and "CC in complete graphs" is used
for the version we have described here.

1.2 Paper Structure 

We start with basic notation in Section 2. We then present our main algorithm in Section 3,
followed by its analysis in Section 4. We discuss future work in Section 5.

2 Notation 

Before describing the framework we give some general facts and notations. Let the input graph be
$G = (L, R, E)$ where $L$ and $R$ are the sets of left and right nodes and $E$ is a subset of $L \times R$. Each
element $(\ell, r) \in L \times R$ will be referred to as a pair.

A solution to our combinatorial problem is a clustering $C_1, C_2, \ldots, C_m$ of the set $L \cup R$. We
identify such a clustering with a bipartite graph $B = (L, R, E_B)$ for which $(\ell, r) \in E_B$ if and only
if $\ell \in L$ and $r \in R$ are in the same cluster $C_i$ for some $i$. Note that given $B$, we are unable to
identify clusters contained exclusively in $L$ (or $R$), but this will not affect the cost, so we adopt the
convention that single-side clusters are always singletons.

We will say that a pair $e = (\ell, r)$ is erroneous if $e \in (E \setminus E_B) \cup (E_B \setminus E)$. For convenience, let
$x_{G,B}$ be the indicator function for the erroneous pair set, i.e., $x_{G,B}(e) = 1$ if $e$ is erroneous and
$0$ otherwise. We will also simply use $x(e)$ when it is obvious to which graph $G$ and clustering $B$ it
refers. The cost of a clustering solution is defined to be $\mathrm{cost}_G(B) = \sum_{e \in L \times R} x_{G,B}(e)$. Similarly,
we will use $\mathrm{cost}(B) = \sum_{e \in L \times R} x(e)$ when $G$ is clear from the context. Let $N(\ell) = \{r \mid (\ell, r) \in E\}$ be
the set of all right nodes adjacent to $\ell$.
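
As a concrete reading of the cost definition, the following Python sketch counts the erroneous pairs of a clustering. The function name `bcc_cost`, the representation of clusters as sets of node names, and the toy graph are assumptions chosen for illustration; left and right node names are assumed to be distinct.

```python
def bcc_cost(left, right, edges, clusters):
    """Count pairs (l, r) in L x R on which the clustering disagrees with E."""
    edge_set = set(edges)
    label = {}
    for i, cluster in enumerate(clusters):
        for v in cluster:
            label[v] = i
    cost = 0
    for l in left:
        for r in right:
            same_cluster = label[l] == label[r]
            # Erroneous pair: an edge that is separated, or a non-edge that is joined.
            if ((l, r) in edge_set) != same_cluster:
                cost += 1
    return cost


# Hypothetical toy graph: two left nodes, two right nodes, three edges.
left, right = ["l1", "l2"], ["r1", "r2"]
edges = [("l1", "r1"), ("l2", "r1"), ("l2", "r2")]
print(bcc_cost(left, right, edges, [{"l1", "l2", "r1", "r2"}]))   # one cluster: 1
print(bcc_cost(left, right, edges, [{"l1", "r1"}, {"l2", "r2"}]))  # two clusters: 1
```

Pairs on the same side never enter the sum, which is why single-side clusters can be treated as singletons without changing the cost.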

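It will be convenient for what follows to define a tuple. We define a tuple $T$ to be
$T = (\ell_1^T, \ell_2^T, R_1^T, R_{1,2}^T, R_2^T)$ where $\ell_1^T, \ell_2^T \in L$, $\ell_1^T \neq \ell_2^T$,
$R_1^T \subseteq N(\ell_1^T) \setminus N(\ell_2^T)$, $R_2^T \subseteq N(\ell_2^T) \setminus N(\ell_1^T)$
and $R_{1,2}^T \subseteq N(\ell_2^T) \cap N(\ell_1^T)$. In what follows, we may omit the superscript of $T$. Given a
tuple $T = (\ell_1^T, \ell_2^T, R_1^T, R_{1,2}^T, R_2^T)$, we define the conjugate tuple
$\bar{T} = (\ell_1^{\bar{T}}, \ell_2^{\bar{T}}, R_1^{\bar{T}}, R_{1,2}^{\bar{T}}, R_2^{\bar{T}}) = (\ell_2^T, \ell_1^T, R_2^T, R_{1,2}^T, R_1^T)$.
Note that $\bar{\bar{T}} = T$.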
[Figure 1 image not reproduced. Panel labels, left to right: $\ell_2$ joins w.p. 1; $\ell_2$ becomes a singleton w.p. 1; $\ell_2$ joins w.p. 2/3; $\ell_2$ becomes a singleton w.p. 1/2.]

Figure 1: Four example cases in which $\ell_2$ either joins the cluster created by $\ell_1$ or becomes a
singleton. In the two rightmost examples, with the remaining probability nothing is decided about
$\ell_2$.

3 The Algorithm 

We now describe our algorithm PivotBiCluster. The algorithm is sequential. In every cycle it
creates one cluster and possibly many singletons, all of which are removed from the graph before
continuing to the next iteration. Abusing notation, by $N(\ell)$ we mean, in the algorithm's description,
all the neighbors of $\ell \in L$ which have not yet been removed from the graph.

Every such cycle performs two phases. In the first phase, PivotBiCluster picks a node on the left
side uniformly at random, $\ell_1$, and forms a new cluster $C = \{\ell_1\} \cup N(\ell_1)$. This will be referred to as
the $\ell_1$-phase and $\ell_1$ will be referred to as the left center of the cluster. In the second phase, denoted
as the $\ell_2$-sub-phase corresponding to the $\ell_1$-phase, the algorithm iterates over all other remaining
left nodes, $\ell_2$, and decides either to (1) append them to $C$, (2) turn them into singletons, or (3) do
nothing. We now explain how to make this decision. Let $R_1 = N(\ell_1) \setminus N(\ell_2)$, $R_2 = N(\ell_2) \setminus N(\ell_1)$
and $R_{1,2} = N(\ell_1) \cap N(\ell_2)$. With probability $\min\{|R_{1,2}|/|R_2|, 1\}$ do one of two things: (1) if
$|R_{1,2}| \geq |R_1|$, append $\ell_2$ to $C$, and otherwise (2) (if $|R_{1,2}| < |R_1|$), turn $\ell_2$ into a singleton. With the
remaining probability, (3) do nothing for $\ell_2$, leaving it in the graph for future iterations. Examples of cases
the algorithm encounters for different ratios of $R_1$, $R_{1,2}$, and $R_2$ are given in Figure 1.
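
The two phases can be sketched in Python as follows. This is a minimal sketch of the description above under stated assumptions, not a reference implementation: the function name `pivot_bicluster` and the `rng` parameter are illustrative, the ratio $|R_{1,2}|/|R_2|$ is read as 1 when $R_2$ is empty and $R_{1,2}$ is not, the undefined case where both are empty is treated as "do nothing", and right nodes left over once the left side is exhausted are output as singletons.

```python
import random


def pivot_bicluster(left, right, edges, rng=None):
    """Sketch of PivotBiCluster on a bipartite graph given as (left, right, edges)."""
    rng = rng or random.Random()
    nbrs = {l: set() for l in left}
    for l, r in edges:
        nbrs[l].add(r)
    left_rem, right_rem = set(left), set(right)
    clusters = []
    while left_rem:
        # l1-phase: pick a uniformly random left center and cluster it with N(l1).
        l1 = rng.choice(sorted(left_rem))
        n1 = nbrs[l1] & right_rem
        cluster = {l1} | n1
        left_rem.discard(l1)
        # l2-sub-phase: randomized decision for every other remaining left node.
        for l2 in sorted(left_rem):
            n2 = nbrs[l2] & right_rem
            r1, r12, r2 = n1 - n2, n1 & n2, n2 - n1
            if r2:
                p = min(len(r12) / len(r2), 1.0)
            else:
                p = 1.0 if r12 else 0.0  # assumption: 0/0 means "do nothing"
            if rng.random() < p:
                if len(r12) >= len(r1):
                    cluster.add(l2)          # (1) append l2 to the new cluster
                else:
                    clusters.append({l2})    # (2) turn l2 into a singleton
                left_rem.discard(l2)
            # (3) otherwise l2 stays in the graph for future iterations
        clusters.append(cluster)
        right_rem -= n1
    clusters.extend({r} for r in sorted(right_rem))  # assumption: leftovers are singletons
    return clusters
```

On an input that is already a disjoint union of bi-cliques, a left node with the same neighborhood as the pivot joins with probability 1 and a node with a disjoint neighborhood is left alone, so the sketch returns the bi-cliques for every pivot order.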

Theorem 3.1. Algorithm PivotBiCluster returns a solution with expected cost at most 4 times that of
the optimal solution.

4 Algorithm Analysis 

We start by describing bad events. This will help us relate the expected cost of the algorithm to a 
sum of event probabilities and expected consequent costs. 

Definition 4.1. We say that a bad event, $X_T$, happens to the tuple $T = (\ell_1^T, \ell_2^T, R_1^T, R_{1,2}^T, R_2^T)$ if
during the execution of PivotBiCluster, $\ell_1^T$ was chosen to be a left center while $\ell_2^T$ was still in the
graph, and at that moment, $R_1^T = N(\ell_1^T) \setminus N(\ell_2^T)$, $R_{1,2}^T = N(\ell_1^T) \cap N(\ell_2^T)$, and $R_2^T = N(\ell_2^T) \setminus N(\ell_1^T)$.
(We refer by $N(\cdot)$ here to the neighborhood function in a particular moment of the algorithm
execution.)
If a bad event $X_T$ happens to tuple $T$ we "color" the following pairs with color $T$:

• $\{(\ell_2^T, r_1) : r_1 \in R_1^T\}$,

• $\{(\ell_2^T, r_{1,2}) : r_{1,2} \in R_{1,2}^T\}$,

• $\{(\ell_2^T, r_2) : r_2 \in R_2^T\}$, only if we decide to associate $\ell_2^T$ to $\ell_1^T$'s cluster, or if we decide to make
$\ell_2^T$ a singleton during the $\ell_2$-sub-phase corresponding to the $\ell_1$-phase.

Lemma 4.1. During the execution of PivotBiCluster each pair $(\ell, r) \in L \times R$ is colored at most
once, and each pair on which the output errs is colored exactly once.

Proof. For the first part, we show that pairs are colored at most once. A pair $(\ell, r)$ can only be
colored during an $\ell_2$-sub-phase with respect to some $\ell_1$-phase, if $\ell = \ell_2$. Clearly, this will only
happen in one $\ell_1$-phase, as every time a pair is colored either $\ell_2$ or $r$ (or both) are removed from
the graph. Indeed, either $r \in R_1 \cup R_{1,2}$, in which case $r$ is removed, or $r \in R_2$, but then $\ell$ is removed
since it either joins the cluster created by $\ell_1$ or becomes a singleton.

For the second part, note that the only pairs which are not colored are between left centers
(during $\ell_1$-phases) and right nodes in the graph at that time. On all these pairs the algorithm does
not err. ■

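We denote by $q_T$ the probability that event $X_T$ occurs and by $\mathrm{cost}(T)$ the number of erroneous
pairs that are colored by $X_T$. From Lemma 4.1 we get the following:

Corollary 4.1.

\[
E[\mathrm{cost}(\text{PivotBiCluster})] = E\Big[\sum_{e \in L \times R} x(e)\Big] = E\Big[\sum_{T} \mathrm{cost}(T)\Big] = \sum_{T} q_T \cdot E[\mathrm{cost}(T) \mid X_T] .
\]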

Note: In what follows we use the terms erroneous pairs and violating pairs or violation pairs
interchangeably, referring to pairs on which the algorithm incurs a unit of cost.



4.1 Contradicting Structures 

We now identify bad structures in the graph for which every output must incur some cost. In the 
case of BCC the minimal such structures are "bad squares": A set of four nodes, two on each side, 
between which there are only three edges. We make the trivial observation that any clustering 
B must make at least one mistake on any such bad square, s (we think of s as the set of 4 pairs 
connecting its two left nodes and two right nodes). Any clustering solution's violating pair set must 
hit these squares. Let S denote the set of all bad squares in the input graph G. 
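
For small inputs, the bad squares can be enumerated directly, and the observation that every clustering errs on a bad square can be checked by brute force. The sketch below is illustrative only; the names `bad_squares` and `min_cost_over_clusterings` and the exhaustive search over labelings are assumptions made for the example and are feasible only for a handful of nodes.

```python
from itertools import combinations, product


def bad_squares(left, right, edges):
    """All sets of 2 left and 2 right nodes with exactly 3 of the 4 possible edges."""
    edge_set = set(edges)
    squares = []
    for l1, l2 in combinations(left, 2):
        for r1, r2 in combinations(right, 2):
            pairs = [(l1, r1), (l1, r2), (l2, r1), (l2, r2)]
            if sum(p in edge_set for p in pairs) == 3:
                squares.append(pairs)
    return squares


def min_cost_over_clusterings(left, right, edges):
    """Brute-force optimum: try every assignment of cluster labels to nodes."""
    nodes = list(left) + list(right)
    edge_set = set(edges)
    best = None
    for labels in product(range(len(nodes)), repeat=len(nodes)):
        label = dict(zip(nodes, labels))
        cost = sum(
            ((l, r) in edge_set) != (label[l] == label[r])
            for l in left for r in right
        )
        best = cost if best is None else min(best, cost)
    return best


# Hypothetical toy input: a single bad square.
square_edges = [("l1", "r1"), ("l2", "r1"), ("l2", "r2")]
print(len(bad_squares(["l1", "l2"], ["r1", "r2"], square_edges)))        # 1
print(min_cost_over_clusterings(["l1", "l2"], ["r1", "r2"], square_edges))  # 1
```

The three edges force all four nodes into one cluster, which then contains the missing fourth pair, so no clustering of a bad square has cost 0.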

It is not enough to concentrate on squares in our analysis. Indeed, at an $\ell_2$-sub-phase, decisions
are made based on the intersection pattern of the current neighborhoods of $\ell_2$ and $\ell_1$, a possibly
unbounded structure. The tuples now come in handy.

Consider a tuple $T = (\ell_1^T, \ell_2^T, R_1^T, R_{1,2}^T, R_2^T)$ for which $|R_{1,2}^T| > 0$ and $|R_2^T| > 0$. Notice that
for every selection of $r_2 \in R_2^T$ and $r_{1,2} \in R_{1,2}^T$ the tuple contains the bad square induced by
$\{\ell_1, r_2, \ell_2, r_{1,2}\}$. Note that there may also be bad squares $\{\ell_2, r_1, \ell_1, r_{1,2}\}$ for every $r_1 \in R_1^T$ and
$r_{1,2} \in R_{1,2}^T$, but these will be associated to the conjugate tuple $\bar{T} = (\ell_2^T, \ell_1^T, R_2^T, R_{1,2}^T, R_1^T)$.

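For each tuple we can write a corresponding linear constraint on the function $\{x(e) : e \in L \times R\}$,
indicating, as we explained above, the pairs on which the algorithm errs. A tuple constraint is the
sum of the constraints of the squares it is associated with, where a constraint for square $s$ is simply
defined as $\sum_{e \in s} x(e) \geq 1$. Since each tuple corresponds to $|R_2^T| \cdot |R_{1,2}^T|$ bad squares, we get the
following constraint:

\[
\forall T: \quad \sum_{r_2 \in R_2^T,\; r_{1,2} \in R_{1,2}^T} \left( x_{\ell_1^T, r_2} + x_{\ell_1^T, r_{1,2}} + x_{\ell_2^T, r_2} + x_{\ell_2^T, r_{1,2}} \right)
= \sum_{r_2 \in R_2^T} |R_{1,2}^T| \cdot \left( x_{\ell_1^T, r_2} + x_{\ell_2^T, r_2} \right) + \sum_{r_{1,2} \in R_{1,2}^T} |R_2^T| \cdot \left( x_{\ell_1^T, r_{1,2}} + x_{\ell_2^T, r_{1,2}} \right) \geq |R_2^T| \cdot |R_{1,2}^T|
\]

The following linear program hence provides a lower bound for the optimal solution:

\[
LP = \min \sum_{e \in L \times R} x(e)
\]
\[
\text{s.t.} \quad \forall T: \quad \frac{1}{|R_2^T|} \sum_{r_2 \in R_2^T} \left( x_{\ell_1^T, r_2} + x_{\ell_2^T, r_2} \right) + \frac{1}{|R_{1,2}^T|} \sum_{r_{1,2} \in R_{1,2}^T} \left( x_{\ell_1^T, r_{1,2}} + x_{\ell_2^T, r_{1,2}} \right) \geq 1
\]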

Notice that all the constraints in this program are sums of square constraints. This means that 
the program is equivalent to one in which only square constraints are present. Our formulation, 
however, allows the definition of useful dual variables corresponding to each tuple T. The dual 
program is as follows: 



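\[
DP = \max \sum_{T} \beta(T)
\]
\[
\text{s.t.} \quad \forall (\ell, r) \in E: \quad \sum_{T :\, \ell_2^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \beta(T) + \sum_{T :\, \ell_1^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) + \sum_{T :\, \ell_2^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) \leq 1
\]
\[
\text{and} \quad \forall (\ell, r) \notin E: \quad \sum_{T :\, \ell_1^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \beta(T) \leq 1
\]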



4.2 Obtaining the Competitive Analysis 



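We now relate the expected cost of the algorithm on each tuple to a feasible solution for DP. We
remind the reader that $q_T$ denotes the probability that a bad event $X_T$ happens to tuple $T$.

Lemma 4.2. Let $\beta(T) = \alpha_T \cdot q_T \cdot \min\{|R_{1,2}^T|, |R_2^T|\}$, where

\[
\alpha_T = \min\left\{ 1, \frac{|R_{1,2}^T|}{\min\{|R_{1,2}^T|, |R_1^T|\} + \min\{|R_{1,2}^T|, |R_2^T|\}} \right\} ,
\]

then $\beta$ is a feasible solution to DP.

In other words, for every edge $e = (\ell, r) \in E$:

\[
\sum_{T \text{ s.t. } \ell_2^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \beta(T) + \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) + \sum_{T \text{ s.t. } \ell_2^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) \leq 1 . \tag{1}
\]

And for every pair $e = (\ell, r) \notin E$:

\[
\sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \beta(T) \leq 1 . \tag{2}
\]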

Proof. First, notice that given a pair $e = (\ell, r) \in E$ each tuple $T$ can appear at most in one of the
sums in the LHS of (1). Denote by $X_{e,T}$ the event that the edge $e$ is colored with color $T$. We
distinguish between two cases.

1. Consider $T$ appearing in the first sum of the LHS of (1), meaning that $\ell_2^T = \ell$ and $r \in R_2^T$.
We distinguish between two sub-cases.

• If $|R_{1,2}^T| \geq |R_1^T|$, $e$ is colored with color $T$ if $\ell_2^T$ joined the cluster of $\ell_1^T$. This happens,
conditioned on $X_T$, with probability $\Pr[X_{e,T} \mid X_T] = \min\left\{ \frac{|R_{1,2}^T|}{|R_2^T|}, 1 \right\}$.

• If $|R_{1,2}^T| < |R_1^T|$ we color $e$ with color $T$ if $\ell_2^T$ was isolated, which happens with probability
$\Pr[X_{e,T} \mid X_T] = \min\left\{ \frac{|R_{1,2}^T|}{|R_2^T|}, 1 \right\}$ as well.

Thus, $T$ contributes the following expression to the sum:

\[
\frac{1}{|R_2^T|} \beta(T) = \frac{1}{|R_2^T|} \alpha_T \cdot q_T \cdot \min\{|R_{1,2}^T|, |R_2^T|\} \leq q_T \cdot \min\left\{ \frac{|R_{1,2}^T|}{|R_2^T|}, 1 \right\}
= \Pr[X_T] \Pr[X_{e,T} \mid X_T] = \Pr[X_{e,T}] .
\]

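2. $T$ contributes to the second or third sum in the LHS of (1). By definition of the conjugate
$\bar{T}$, the following holds:

\[
\sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) + \sum_{T \text{ s.t. } \ell_2^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) = \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \left( \beta(T) + \beta(\bar{T}) \right) . \tag{3}
\]

Therefore it is sufficient to bound the contribution of each $T$ to the RHS of (3). We may
therefore focus on tuples $T$ for which $\ell = \ell_1^T$ and $r \in R_{1,2}^T$. Consider a moment in the
algorithm's execution in which both $\ell_1^T$ and $\ell_2^T$ were still present in the graph, $R_1^T = N(\ell_1^T) \setminus N(\ell_2^T)$,
$R_{1,2}^T = N(\ell_1^T) \cap N(\ell_2^T)$, $R_2^T = N(\ell_2^T) \setminus N(\ell_1^T)$ and one of $\ell_1^T, \ell_2^T$ was chosen to be a left
center.⁴ Either one of $\ell_1^T$ and $\ell_2^T$ had the same probability to be chosen. In other words:

\[
\Pr[X_T \mid X_T \cup X_{\bar{T}}] = \Pr[X_{\bar{T}} \mid X_T \cup X_{\bar{T}}] ,
\]

and hence, $q_T = q_{\bar{T}}$. Further, notice that $e = (\ell, r)$ is never colored with color $T$, and if event
$X_{\bar{T}}$ happens then $e$ is colored with color $\bar{T}$ with probability 1. Therefore:

\[
\frac{1}{|R_{1,2}^T|} \left( \beta(T) + \beta(\bar{T}) \right)
= \frac{1}{|R_{1,2}^T|} \cdot q_T \cdot \min\left\{ 1, \frac{|R_{1,2}^T|}{\min\{|R_{1,2}^T|, |R_1^T|\} + \min\{|R_{1,2}^T|, |R_2^T|\}} \right\} \cdot \left( \min\{|R_{1,2}^T|, |R_1^T|\} + \min\{|R_{1,2}^T|, |R_2^T|\} \right)
\leq q_T = q_{\bar{T}} = \Pr[X_{\bar{T}}] = \Pr[X_{e,\bar{T}}] + \Pr[X_{e,T}] .
\]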

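Summing this all together, for every edge $e \in E$:

\[
\sum_{T \text{ s.t. } \ell_2^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \beta(T) + \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) + \sum_{T \text{ s.t. } \ell_2^T = \ell,\, r \in R_{1,2}^T} \frac{1}{|R_{1,2}^T|} \beta(T) \leq \sum_{T} \Pr[X_{e,T}] .
\]

By the first part of Lemma 4.1 we know that $\sum_T \Pr[X_{e,T}]$ is exactly the probability of the edge $e$
to be colored (the sum is over probabilities of disjoint events), therefore it is at most 1, as required
to satisfy (1).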

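Now consider a pair $e = (\ell, r) \notin E$. A tuple $T$ contributes to (2) if $\ell_1^T = \ell$ and $r \in R_2^T$. Since,
as before, $q_T = q_{\bar{T}}$ and since $\Pr[X_{e,\bar{T}} \mid X_{\bar{T}}] = 1$ (this follows from the first coloring rule described in
the beginning of Section 4) we obtain the following:

\[
\sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \beta(T)
= \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_2^T} \frac{1}{|R_2^T|} \cdot \alpha_T \cdot q_T \cdot \min\{|R_{1,2}^T|, |R_2^T|\}
\leq \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_2^T} q_T
= \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_2^T} q_{\bar{T}}
\]
\[
= \sum_{T \text{ s.t. } \ell_1^T = \ell,\, r \in R_2^T} \Pr[X_{\bar{T}}]
= \sum_{\bar{T} \text{ s.t. } \ell_2^{\bar{T}} = \ell,\, r \in R_1^{\bar{T}}} \Pr[X_{e,\bar{T}}]
= \sum_{T} \Pr[X_{e,T}] .
\]

For the same reason as before, this is at most 1, as required for (2). ■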
⁴ We use the definition of $N(\cdot)$ which depends on the "current" state of the graph at that moment, after possibly
removing previously created clusters.

After presenting the feasible solution to our dual program, we have left to prove that the
expected cost of PivotBiCluster is at most 4 times the DP value of this solution. For this we need
the following:

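Lemma 4.3. For any tuple $T$,

\[
q_T \cdot E[\mathrm{cost}(T) \mid X_T] + q_{\bar{T}} \cdot E[\mathrm{cost}(\bar{T}) \mid X_{\bar{T}}] \leq 4 \cdot \left( \beta(T) + \beta(\bar{T}) \right) .
\]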

Proof. We consider three cases, according to the structure of T. 

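Case 1. $|R_1^T| \leq |R_{1,2}^T|$, $|R_2^T| \leq |R_{1,2}^T|$ (equivalently $|R_2^{\bar{T}}| \leq |R_{1,2}^{\bar{T}}|$, $|R_1^{\bar{T}}| \leq |R_{1,2}^{\bar{T}}|$):

For this case, $\alpha_T = \alpha_{\bar{T}} = \min\left\{ 1, \frac{|R_{1,2}^T|}{|R_1^T| + |R_2^T|} \right\}$ and we have (recall that $q_T = q_{\bar{T}}$)

\[
\beta(T) + \beta(\bar{T}) = \alpha_T \cdot q_T \cdot \left( \min\{|R_{1,2}^T|, |R_2^T|\} + \min\{|R_{1,2}^T|, |R_1^T|\} \right)
= q_T \cdot \min\{ |R_2^T| + |R_1^T|, |R_{1,2}^T| \} \geq \frac{1}{2} \cdot q_T \cdot \left( |R_2^T| + |R_1^T| \right) .
\]

Since $|R_1^T| \leq |R_{1,2}^T|$, if event $X_T$ happens PivotBiCluster adds $\ell_2^T$ to $\ell_1^T$'s cluster with probability
$\min\left\{ \frac{|R_{1,2}^T|}{|R_2^T|}, 1 \right\} = 1$. Therefore the pairs colored with color $T$ that PivotBiCluster violates are all
the edges from $\ell_2^T$ to $R_2^T$ and all the non-edges from $\ell_2^T$ to $R_1^T$, namely, $|R_2^T| + |R_1^T|$ pairs. The
same happens in the event $X_{\bar{T}}$, as the conditions on $|R_1^T|$, $|R_{1,2}^T|$, and $|R_2^T|$ are the same, and since
$|R_2^{\bar{T}}| + |R_1^{\bar{T}}| = |R_1^T| + |R_2^T|$. Thus,

\[
q_T \cdot \left( E[\mathrm{cost}(T) \mid X_T] + E[\mathrm{cost}(\bar{T}) \mid X_{\bar{T}}] \right) = q_T \left( 2 \left( |R_2^T| + |R_1^T| \right) \right) \leq 4 \cdot \left( \beta(T) + \beta(\bar{T}) \right) .
\]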

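Case 2. $|R_1^T| \leq |R_{1,2}^T| < |R_2^T|$ (equivalently $|R_1^{\bar{T}}| > |R_{1,2}^{\bar{T}}| \geq |R_2^{\bar{T}}|$):

Here $\alpha_T = \alpha_{\bar{T}} = \min\left\{ 1, \frac{|R_{1,2}^T|}{|R_{1,2}^T| + |R_1^T|} \right\}$, therefore,

\[
\beta(T) + \beta(\bar{T}) = \alpha_T \cdot q_T \cdot \left( \min\{|R_{1,2}^T|, |R_2^T|\} + \min\{|R_{1,2}^T|, |R_1^T|\} \right)
= q_T \cdot \min\{ |R_{1,2}^T| + |R_1^T|, |R_{1,2}^T| \} = q_T \cdot |R_{1,2}^T| .
\]

As $|R_1^T| \leq |R_{1,2}^T|$, if event $X_T$ happens PivotBiCluster adds $\ell_2^T$ to $\ell_1^T$'s cluster with probability
$\min\left\{ \frac{|R_{1,2}^T|}{|R_2^T|}, 1 \right\} = \frac{|R_{1,2}^T|}{|R_2^T|}$. Therefore with probability $\frac{|R_{1,2}^T|}{|R_2^T|}$ the pairs colored by color $T$ that
PivotBiCluster violates are all the edges from $\ell_2^T$ to $R_2^T$ and all the non-edges from $\ell_2^T$ to $R_1^T$, and with
probability $\left( 1 - \frac{|R_{1,2}^T|}{|R_2^T|} \right)$ PivotBiCluster violates all the edges from $\ell_2^T$ to $R_{1,2}^T$. Thus,

\[
E[\mathrm{cost}(T) \mid X_T] = \frac{|R_{1,2}^T|}{|R_2^T|} \left( |R_2^T| + |R_1^T| \right) + \left( 1 - \frac{|R_{1,2}^T|}{|R_2^T|} \right) |R_{1,2}^T|
= 2 \cdot |R_{1,2}^T| + \frac{|R_{1,2}^T| \cdot \left( |R_1^T| - |R_{1,2}^T| \right)}{|R_2^T|} ,
\]

where the last term is nonpositive because $|R_1^T| \leq |R_{1,2}^T|$.

If the event $X_{\bar{T}}$ happens, as $|R_1^{\bar{T}}| > |R_{1,2}^{\bar{T}}|$ and $\min\left\{ \frac{|R_{1,2}^{\bar{T}}|}{|R_2^{\bar{T}}|}, 1 \right\} = 1$, PivotBiCluster chooses to isolate
$\ell_2^{\bar{T}}$ ($= \ell_1^T$) with probability 1 and the number of pairs colored with color $\bar{T}$ that are consequently
violated is $|R_2^{\bar{T}}| + |R_{1,2}^{\bar{T}}| = |R_1^T| + |R_{1,2}^T|$. Thus,

\[
q_T \cdot \left( E[\mathrm{cost}(T) \mid X_T] + E[\mathrm{cost}(\bar{T}) \mid X_{\bar{T}}] \right) \leq q_T \cdot \left( 2 |R_{1,2}^T| + |R_1^T| + |R_{1,2}^T| \right)
\leq 4 \cdot q_T \cdot |R_{1,2}^T| = 4 \cdot \left( \beta(T) + \beta(\bar{T}) \right) .
\]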

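Case 3. $|R_{1,2}^T| < |R_1^T|$, $|R_{1,2}^T| < |R_2^T|$ (equivalently, $|R_{1,2}^{\bar{T}}| < |R_2^{\bar{T}}|$, $|R_{1,2}^{\bar{T}}| < |R_1^{\bar{T}}|$):

Here, $\alpha_T = \alpha_{\bar{T}} = \frac{1}{2}$, thus,

\[
\beta(T) + \beta(\bar{T}) = \frac{1}{2} \cdot q_T \cdot \left( \min\{|R_{1,2}^T|, |R_2^T|\} + \min\{|R_{1,2}^T|, |R_1^T|\} \right) = q_T \cdot |R_{1,2}^T| .
\]

Conditioned on event $X_T$, as $|R_1^T| > |R_{1,2}^T|$, PivotBiCluster chooses to isolate $\ell_2^T$ with probability
$\min\left\{ \frac{|R_{1,2}^T|}{|R_2^T|}, 1 \right\} = \frac{|R_{1,2}^T|}{|R_2^T|}$. Therefore with probability $\frac{|R_{1,2}^T|}{|R_2^T|}$ PivotBiCluster violates $|R_2^T| + |R_{1,2}^T|$ of the
pairs colored with color $T$ (all the edges of $\ell_2^T$). With probability $\left( 1 - \frac{|R_{1,2}^T|}{|R_2^T|} \right)$ PivotBiCluster violates
$|R_{1,2}^T|$ of the pairs colored with color $T$ (the edges from $\ell_2^T$ to $R_{1,2}^T$). We conclude that

\[
E[\mathrm{cost}(T) \mid X_T] = \frac{|R_{1,2}^T|}{|R_2^T|} \left( |R_2^T| + |R_{1,2}^T| \right) + \left( 1 - \frac{|R_{1,2}^T|}{|R_2^T|} \right) |R_{1,2}^T| = 2 |R_{1,2}^T| .
\]

Similarly, for event $X_{\bar{T}}$, as $|R_1^{\bar{T}}| > |R_{1,2}^{\bar{T}}|$ and $\min\left\{ \frac{|R_{1,2}^{\bar{T}}|}{|R_2^{\bar{T}}|}, 1 \right\} = \frac{|R_{1,2}^T|}{|R_1^T|}$, PivotBiCluster isolates $\ell_1^T$
with probability $\frac{|R_{1,2}^T|}{|R_1^T|}$ and therefore violates $|R_1^T| + |R_{1,2}^T|$ of the pairs colored with color $\bar{T}$.
With probability $\left( 1 - \frac{|R_{1,2}^T|}{|R_1^T|} \right)$ PivotBiCluster violates $|R_{1,2}^T|$ of the pairs colored with color $\bar{T}$.
Thus,

\[
E[\mathrm{cost}(\bar{T}) \mid X_{\bar{T}}] = \frac{|R_{1,2}^T|}{|R_1^T|} \left( |R_1^T| + |R_{1,2}^T| \right) + \left( 1 - \frac{|R_{1,2}^T|}{|R_1^T|} \right) |R_{1,2}^T| = 2 |R_{1,2}^T| .
\]

And therefore

\[
q_T \cdot \left( E[\mathrm{cost}(T) \mid X_T] + E[\mathrm{cost}(\bar{T}) \mid X_{\bar{T}}] \right) = 4 \cdot q_T \cdot |R_{1,2}^T| = 4 \cdot \left( \beta(T) + \beta(\bar{T}) \right) .
\]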
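
The case analysis of Lemma 4.3 depends only on the three set sizes, so it can be sanity-checked numerically with exact rational arithmetic. The sketch below divides both sides of the lemma by $q_T$ (using $q_T = q_{\bar{T}}$) and checks the inequality for all small sizes. The function names are illustrative assumptions, and the check is restricted to sizes of at least 1 so that every ratio is defined; it supplements the proof and does not replace it.

```python
from fractions import Fraction


def alpha(r1, r12, r2):
    """alpha_T from Lemma 4.2, as a function of |R_1|, |R_{1,2}|, |R_2|."""
    return min(Fraction(1), Fraction(r12, min(r12, r1) + min(r12, r2)))


def expected_cost(r1, r12, r2):
    """E[cost(T) | X_T]: expected number of violated pairs colored T."""
    p = min(Fraction(r12, r2), Fraction(1))   # probability that l2 is decided
    # Joining violates R_2 edges and R_1 non-edges; isolating violates all edges of l2.
    decided = (r1 + r2) if r12 >= r1 else (r12 + r2)
    return p * decided + (1 - p) * r12        # undecided: only the R_{1,2} edges are cut


def lemma_4_3_gap(r1, r12, r2):
    """4 * (beta(T) + beta(conj T)) minus the expected costs, all divided by q_T."""
    lhs = expected_cost(r1, r12, r2) + expected_cost(r2, r12, r1)
    beta_sum = alpha(r1, r12, r2) * min(r12, r2) + alpha(r2, r12, r1) * min(r12, r1)
    return 4 * beta_sum - lhs


# The gap is nonnegative on every size triple tried here.
print(all(lemma_4_3_gap(a, b, c) >= 0
          for a in range(1, 13) for b in range(1, 13) for c in range(1, 13)))
```

The conjugate tuple is obtained by swapping the first and third arguments, which mirrors the definition $\bar{T} = (\ell_2^T, \ell_1^T, R_2^T, R_{1,2}^T, R_1^T)$. In Case 3 the gap is exactly zero, for instance at sizes (3, 1, 3).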



By Corollary 4.1

\[
E[\mathrm{cost}(\text{PivotBiCluster})] = \sum_{T} \Pr[X_T] \cdot E[\mathrm{cost}(T) \mid X_T]
= \frac{1}{2} \sum_{T} \left( \Pr[X_T] \cdot E[\mathrm{cost}(T) \mid X_T] + \Pr[X_{\bar{T}}] \cdot E[\mathrm{cost}(\bar{T}) \mid X_{\bar{T}}] \right) .
\]

By Lemma 4.3 the above RHS is at most $2 \cdot \sum_T \left( \beta(T) + \beta(\bar{T}) \right) = 4 \cdot \sum_T \beta(T)$. Therefore by the
weak duality theorem we conclude that

\[
E[\mathrm{cost}(\text{PivotBiCluster})] \leq 4 \cdot \sum_{T} \beta(T) \leq 4 \cdot OPT .
\]

This proves our main result, Theorem 3.1.



5 Future Work 

Improving the approximation factor as well as derandomizing the algorithm (along the lines of [12], or
using other techniques) are interesting questions. One direction that seems promising is to devise an
LP rounding algorithm using a variation of PivotBiCluster (along the lines of the LP-based algorithms
in [10]).
A A Counter Example for a Previously Claimed Result

In [1] the authors claim to design and analyze a 4-approximation algorithm for BCC. Its analysis is
based on bad squares (and not unbounded structures, as done in our analysis). Their algorithm is
as follows: First, choose a pivot node uniformly at random from the left side, and cluster it with
all its neighbors. Then, for each node on the left, if it has a neighbor in the newly created cluster,
append it with probability 1/2. An exception is reserved for nodes whose neighbor list is identical
to that of the pivot, in which case these nodes join with probability 1. Remove the clustered nodes
and repeat until no nodes are left in the graph.

Unfortunately, there is an example demonstrating that the algorithm has an unbounded
approximation ratio. Consider a bipartite graph on $2n$ nodes, $\ell_1, \ldots, \ell_n$ on the left and $r_1, \ldots, r_n$ on the
right. Let each node $\ell_i$ on the left be connected to all nodes on the right except for $r_i$. The
optimal clustering of this graph connects all $\ell_i$ and $r_j$ nodes and thus has cost $OPT = n$. In the
above algorithm, however, the first cluster created will include all but one of the nodes on the right
and roughly half the left ones. This already incurs a cost of $\Omega(n^2)$, which is a factor $n$ worse than
the best possible.
As a side note, the authors of this abstract have also tried to design an algorithm based on an
analysis involving squares only, to no avail.
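
The counterexample can be explored with a short simulation. The sketch below follows the procedure only as it is paraphrased in this appendix, not the original description in [1]; the names `flawed_pivot` and `cost`, the comparison of neighbor lists on the remaining graph, and the treatment of leftover right nodes as singletons are assumptions made for the example.

```python
import random


def flawed_pivot(n, rng):
    """One run of the paraphrased procedure on the graph where l_i misses only r_i."""
    nbrs = {i: {j for j in range(n) if j != i} for i in range(n)}  # left i -> right j
    left_rem, right_rem = set(range(n)), set(range(n))
    clusters = []  # each cluster is a pair (left members, right members)
    while left_rem:
        pivot = rng.choice(sorted(left_rem))
        n1 = nbrs[pivot] & right_rem
        members = {pivot}
        for l in sorted(left_rem - {pivot}):
            n2 = nbrs[l] & right_rem
            # Identical neighbor list: join w.p. 1; shared neighbor: join w.p. 1/2.
            if n2 == n1 or (n2 & n1 and rng.random() < 0.5):
                members.add(l)
        clusters.append((members, n1))
        left_rem -= members
        right_rem -= n1
    clusters.extend((set(), {r}) for r in sorted(right_rem))
    return clusters


def cost(n, clusters):
    """Erroneous pairs: (l, r) is an edge exactly when l != r."""
    where_l = {l: i for i, (ls, _) in enumerate(clusters) for l in ls}
    where_r = {r: i for i, (_, rs) in enumerate(clusters) for r in rs}
    return sum((l != r) != (where_l[l] == where_r[r])
               for l in range(n) for r in range(n))


n = 40
print(cost(n, [(set(range(n)), set(range(n)))]))    # single cluster: n
print(cost(n, flawed_pivot(n, random.Random(0))))   # one simulated run
```

Under this reading, if $k$ left nodes (pivot included) end up in the first cluster, the run costs $(n-k)(n-2) + 2(k-1)$, which averages to $n(n-1)/2$ since each non-pivot node joins with probability 1/2. This matches the $\Omega(n^2)$ claim above, against a cost of $n$ for the single-cluster solution.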